Modules

Embeddings

class eole.modules.transformer_mlp.MLP(model_config, running_config=None, moe_transformer_ff=None)

Bases: Module

A two- or three-layer feed-forward network.

  • Parameters:
    • model_config – eole.config.models.ModelConfig object
    • running_config – TrainingConfig or InferenceConfig derived from RunningConfig

gated_forward(x)

Layer definition. Legacy gated operations, without fusion.

  • Parameters: x – Input tensor of shape (batch_size, input_len, model_dim)
  • Returns: Output of shape (batch_size, input_len, model_dim).
  • Return type: FloatTensor

simple_forward(x)

Layer definition.

  • Parameters: x – Input tensor of shape (batch_size, input_len, model_dim)
  • Returns: Output of shape (batch_size, input_len, model_dim).
  • Return type: FloatTensor
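
As an illustration of the gated path, here is a minimal SwiGLU-style gated feed-forward block in plain PyTorch. It is a sketch only: the layer names and the SiLU activation are assumptions, not eole's actual MLP internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    """Three-matrix gated feed-forward block (SwiGLU-style sketch).

    Illustrative only; the names gate_proj / up_proj / down_proj are
    hypothetical and may differ from eole's transformer_mlp.MLP.
    """

    def __init__(self, model_dim: int, ff_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(model_dim, ff_dim, bias=False)
        self.up_proj = nn.Linear(model_dim, ff_dim, bias=False)
        self.down_proj = nn.Linear(ff_dim, model_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch_size, input_len, model_dim) -> same shape
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

mlp = GatedMLP(model_dim=16, ff_dim=64)
out = mlp(torch.randn(2, 5, 16))   # shape preserved: (2, 5, 16)
```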

Encoders

class eole.encoders.encoder.EncoderBase(*args: Any, **kwargs: Any)

Bases: Module, ABC

Abstract base encoder class defining the interface for all encoders.

Used by:

  • eole.Models.EncoderDecoderModel
  • eole.Models.EncoderModel

abstractmethod forward(emb: Tensor | list, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, Any | None]

Encode input embeddings or images.

  • Parameters:
    • emb – Input embeddings (batch, src_len, dim) for text encoders, or list of images for vision encoders
    • pad_mask – Padding mask (batch, src_len) for text encoders. False for actual values, True for padding. May be None for vision encoders.
    • **kwargs – Additional encoder-specific arguments
  • Returns:
    • enc_out: Encoder output for attention (batch, src_len, hidden_size)
    • enc_final_hs: Final hidden state or None. For RNN: (num_layers * directions, batch, hidden_size); for LSTM: a tuple of (hidden, cell); for Transformer/CNN/Vision: None
  • Return type: Tuple containing

update_dropout(dropout: float, attention_dropout: float | None = None) → None

Update dropout rates dynamically.

  • Parameters:
    • dropout – General dropout rate
    • attention_dropout – Attention-specific dropout rate (if applicable)
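
A toy subclass can make the forward contract concrete. The sketch below follows the documented (enc_out, enc_final_hs) return shape but subclasses torch.nn.Module so it stays self-contained; a real encoder would subclass EncoderBase.

```python
import torch
import torch.nn as nn

class IdentityEncoder(nn.Module):
    """Toy encoder following the EncoderBase contract: forward returns
    (enc_out, enc_final_hs). Uses nn.Module here only to keep the
    sketch self-contained."""

    def forward(self, emb, pad_mask=None, **kwargs):
        # Pass embeddings through unchanged; no final hidden state,
        # mirroring the Transformer/CNN/Vision convention above.
        return emb, None

enc = IdentityEncoder()
emb = torch.randn(2, 7, 32)                     # (batch, src_len, dim)
pad_mask = torch.zeros(2, 7, dtype=torch.bool)  # True marks padding
enc_out, final_hs = enc(emb, pad_mask=pad_mask)
```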

class eole.encoders.TransformerEncoder(encoder_config, running_config=None)

Bases: EncoderBase

Transformer encoder from ‘Attention is All You Need’.

Reference: Vaswani et al. (2017), https://arxiv.org/abs/1706.03762

  • Parameters:
    • encoder_config – Complete encoder configuration
    • running_config – Runtime configuration (optional)

forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, None]

Encode input embeddings.

  • Parameters:
    • emb – Input embeddings with positional encodings, shape (batch_size, src_len, model_dim)
    • pad_mask – Padding mask (batch, src_len); False for values, True for padding
    • **kwargs – Additional arguments (ignored)
  • Returns:
    • Encoded output (batch_size, src_len, model_dim)
    • None (transformers don’t return final state)
  • Return type: Tuple of
  • Raises: ValueError – If pad_mask is not provided
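
The pad_mask convention (False for real tokens, True for padding) can be built from sequence lengths. make_pad_mask below is a hypothetical helper for illustration, not part of eole:

```python
import torch

def make_pad_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """Build a (batch, src_len) boolean mask: False for real tokens,
    True for padding, matching the convention documented above."""
    positions = torch.arange(max_len).unsqueeze(0)  # (1, src_len)
    return positions >= lengths.unsqueeze(1)        # broadcast to (batch, src_len)

mask = make_pad_mask(torch.tensor([3, 5]), max_len=5)
# row 0 pads positions 3..4; row 1 has no padding
```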

update_dropout(dropout: float, attention_dropout: float) → None

Update dropout rates for all transformer layers.

class eole.encoders.RNNEncoder(encoder_config, running_config=None)

Bases: EncoderBase

Generic recurrent neural network encoder supporting LSTM, GRU, and RNN.

  • Parameters:
    • encoder_config – Encoder configuration
    • running_config – Runtime configuration (optional)

forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, Tensor | Tuple[Tensor, Tensor]]

Encode input embeddings through RNN.

  • Parameters:
    • emb – Input embeddings (batch, src_len, dim)
    • pad_mask – Padding mask (optional, not used by base RNN)
    • **kwargs – Additional arguments
  • Returns:
    • RNN outputs (batch, src_len, hidden_size)
    • Final hidden state(s)
  • Return type: Tuple of
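
The final-state shapes are easiest to see with a plain torch.nn.LSTM, which (as an assumption) is the kind of module RNNEncoder wraps internally:

```python
import torch
import torch.nn as nn

# Bidirectional 2-layer LSTM illustrating the shapes described above.
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2,
               bidirectional=True, batch_first=True)
emb = torch.randn(4, 10, 8)        # (batch, src_len, dim)
out, (h_n, c_n) = lstm(emb)

# out: (batch, src_len, directions * hidden_size)
# h_n: (num_layers * directions, batch, hidden_size), same for c_n
```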

update_dropout(dropout: float, attention_dropout: float | None = None) → None

Update RNN dropout rate.

class eole.encoders.CNNEncoder(encoder_config, running_config=None)

Bases: EncoderBase

Convolutional sequence-to-sequence encoder.

Based on “Convolutional Sequence to Sequence Learning” (Gehring et al., 2017). Reference: https://arxiv.org/abs/1705.03122

  • Parameters:
    • encoder_config – Encoder configuration
    • running_config – Runtime configuration (optional)

forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, Tensor]

Encode input embeddings through CNN layers.

  • Parameters:
    • emb – Input embeddings (batch, src_len, dim)
    • pad_mask – Padding mask (optional, not used)
    • **kwargs – Additional arguments
  • Returns:
    • CNN output (batch, src_len, hidden_size)
    • Projected embeddings (batch, src_len, hidden_size)
  • Return type: Tuple of

update_dropout(dropout: float, attention_dropout: float | None = None) → None

Update CNN dropout rate.

class eole.encoders.MeanEncoder(encoder_config, running_config=None)

Bases: EncoderBase

Minimal encoder that applies mean pooling over the sequence.

Returns the input embeddings unchanged as encoder output, and provides mean-pooled representations as the final hidden state.

  • Parameters:
    • encoder_config – Encoder configuration
    • running_config – Runtime configuration (optional, unused)

forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, Tuple[Tensor, Tensor]]

Apply mean pooling over sequence dimension.

  • Parameters:
    • emb – Input embeddings (batch, seq_len, emb_dim)
    • pad_mask – Padding mask (batch, seq_len); False for values, True for padding
    • **kwargs – Additional arguments (ignored)
  • Returns:
    • Encoder output: unchanged input embeddings (batch, seq_len, emb_dim)
    • Final hidden state: tuple of (mean, mean) where mean has shape (num_layers, batch, emb_dim)
  • Return type: Tuple of
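
The pooling step can be sketched as a masked mean that ignores padded positions. masked_mean is a hypothetical helper illustrating the idea, not eole's implementation:

```python
import torch

def masked_mean(emb: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    """Mean over the sequence dimension, ignoring padded positions.
    pad_mask is (batch, seq_len) with True marking padding."""
    keep = (~pad_mask).unsqueeze(-1).float()   # (batch, seq_len, 1)
    summed = (emb * keep).sum(dim=1)           # (batch, emb_dim)
    counts = keep.sum(dim=1).clamp(min=1.0)    # avoid division by zero
    return summed / counts

emb = torch.tensor([[[2.0], [4.0], [99.0]]])     # (1, 3, 1)
pad_mask = torch.tensor([[False, False, True]])  # last position is padding
mean = masked_mean(emb, pad_mask)                # averages 2.0 and 4.0 only
```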

class eole.encoders.VisionEncoder(encoder_config, running_config=None)

Bases: EncoderBase

Vision encoder for processing images into token representations.

Supports various vision architectures:

  • CLIP-style with learned positional embeddings
  • Pixtral with RoPE 2D embeddings
  • SAM (Segment Anything Model) preprocessing
  • Parameters:
    • encoder_config – Vision encoder configuration
    • running_config – Runtime configuration (optional)

forward(emb: List[Tensor], pad_mask: Tensor | None = None, sam_patches: Tensor | None = None, **kwargs) → Tuple[Tensor, None]

Encode images into token representations.

  • Parameters:
    • emb – List of N images of variable sizes, each (C, H, W)
    • pad_mask – Not used for vision encoder (uses block diagonal masks)
    • sam_patches – Pre-computed SAM patches (optional)
    • **kwargs – Additional arguments
  • Returns:
    • Encoded image features (N_img, total_tokens, hidden_size)
    • None (vision encoders don’t return hidden states)
  • Return type: Tuple of

update_dropout(dropout: float, attention_dropout: float) → None

Update dropout rates for all transformer layers.

class eole.encoders.AudioEncoder(encoder_config, running_config=None)

Bases: EncoderBase

Audio encoder: Conv1d stem + learned positional embeddings + transformer layers.

Processes mel spectrograms into encoder hidden states for cross-attention with the decoder.

Input: mel spectrogram (batch, num_mels, time)
Output: (batch, time // 2, hidden_size)

  • Parameters:
    • encoder_config – Audio encoder configuration
    • running_config – Runtime configuration (optional)

forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, None]

Encode mel spectrogram features.

  • Parameters:
    • emb – Mel spectrogram tensor (batch, num_mels, time)
    • pad_mask – Not used (fixed-length input)
    • **kwargs – Additional arguments (ignored)
  • Returns:
    • Encoded output (batch, time//2, hidden_size)
    • None (transformers don’t return final state)
  • Return type: Tuple of
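
The time // 2 output length comes from a strided convolution in the stem. The following Whisper-style sketch (hyperparameters assumed, not eole's actual configuration) shows the shape transformation:

```python
import torch
import torch.nn as nn

# Conv1d stem sketch: the second convolution's stride of 2 halves the
# time dimension, yielding the (batch, time // 2, hidden_size) shape
# documented above. Kernel sizes and dims are illustrative assumptions.
num_mels, hidden_size = 80, 256
stem = nn.Sequential(
    nn.Conv1d(num_mels, hidden_size, kernel_size=3, padding=1),
    nn.GELU(),
    nn.Conv1d(hidden_size, hidden_size, kernel_size=3, stride=2, padding=1),
    nn.GELU(),
)
mel = torch.randn(2, num_mels, 100)   # (batch, num_mels, time)
x = stem(mel).transpose(1, 2)         # -> (batch, time // 2, hidden_size)
```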

update_dropout(dropout: float, attention_dropout: float) → None

Update dropout rates for all transformer layers.

Decoders

class eole.decoders.decoder.DecoderBase(attentional: bool = True)

Bases: Module, ABC

Abstract base class for decoders.

  • Parameters: attentional – Whether the decoder returns non-empty attention weights.

abstractmethod forward(emb: Tensor, enc_out: Tensor | None = None, step: int | None = None, **kwargs) → Tuple[Tensor, Dict[str, Tensor]]

Forward pass

  • Parameters:
    • emb (Tensor) – Target embeddings of shape (batch_size, tgt_len, hidden_size).
    • enc_out (Tensor , optional) – Encoder outputs of shape (batch_size, src_len, hidden_size). Required for encoder–decoder models, unused for language models.
    • step (int , optional) – Decoding step for incremental (autoregressive) decoding. None indicates full-sequence decoding.
    • **kwargs – Decoder-specific arguments (e.g. masks, alignment flags).
  • Returns:
    • dec_outs: Decoder outputs of shape (batch_size, tgt_len, hidden_size).
    • attns: Dictionary of attention tensors.
      • attns["std"]: Standard attention of shape (batch_size, tgt_len, src_len), or None if the decoder has no attention.
  • Return type: (Tensor, Dict[str, Tensor])

abstractmethod init_state(**kwargs) → None

Initialize decoder state from encoder outputs.

  • Parameters: **kwargs – Initialization data. Common arguments include:
    • enc_out: Encoder output tensor
    • enc_final_hs: Encoder final hidden states
    • src: Source input tensor

abstractmethod map_state(fn: Callable[[Tensor], Tensor]) → None

Apply a function to all tensors in decoder state.

Used for operations like moving to device or selecting batch indices.

  • Parameters: fn – Function to apply to each tensor in the state.
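
A typical fn reorders cached state tensors along the batch dimension, e.g. after beam selection. A minimal sketch of what such a callback does:

```python
import torch

# Hypothetical decoder state: one cached tensor with 3 "beams".
state = {"hidden": torch.arange(6.0).view(3, 2)}  # [[0,1],[2,3],[4,5]]
select = torch.tensor([2, 0])                     # keep beams 2 and 0

def fn(t: torch.Tensor) -> torch.Tensor:
    # Reindex the batch dimension, as map_state would apply to each tensor.
    return t.index_select(0, select)

state = {k: fn(v) for k, v in state.items()}
```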

update_dropout(dropout: float, attention_dropout: float | None = None) → None

Update dropout rates dynamically.

  • Parameters:
    • dropout – New dropout rate for standard dropout layers.
    • attention_dropout – New dropout rate for attention layers (if applicable).

class eole.decoders.TransformerDecoder(decoder_config, running_config=None)

Bases: DecoderBase

The Transformer decoder from “Attention is All You Need” (Vaswani et al., 2017).

  • Parameters:
    • decoder_config – Complete decoder configuration
    • running_config – Runtime configuration (TrainingConfig or InferenceConfig, optional)

forward(emb, **kwargs)

Forward pass

  • Parameters:
    • emb (Tensor) – Target embeddings of shape (batch_size, tgt_len, hidden_size).
    • enc_out (Tensor , optional) – Encoder outputs of shape (batch_size, src_len, hidden_size). Required for encoder–decoder models, unused for language models.
    • step (int , optional) – Decoding step for incremental (autoregressive) decoding. None indicates full-sequence decoding.
    • **kwargs – Decoder-specific arguments (e.g. masks, alignment flags).
  • Returns:
    • dec_outs: Decoder outputs of shape (batch_size, tgt_len, hidden_size).
    • attns: Dictionary of attention tensors.
      • attns["std"]: Standard attention of shape (batch_size, tgt_len, src_len), or None if the decoder has no attention.
  • Return type: (Tensor, Dict[str, Tensor])

init_state(**kwargs)

Initialize decoder state.

map_state(fn)

Apply a function to all tensors in decoder state.

Used for operations like moving to device or selecting batch indices.

  • Parameters: fn – Function to apply to each tensor in the state.

update_dropout(dropout, attention_dropout)

Update dropout rates dynamically.

  • Parameters:
    • dropout – New dropout rate for standard dropout layers.
    • attention_dropout – New dropout rate for attention layers (if applicable).

class eole.decoders.rnn_decoder.RNNDecoderBase(decoder_config, running_config=None)

Bases: DecoderBase

Base class for recurrent neural network decoders.

Implements common logic for:

  • state initialization and mapping
  • incremental decoding
  • attention handling
  • output and attention normalization

Subclasses must implement _run_forward_pass.

forward(emb: Tensor, enc_out: Tensor | None = None, step: int | None = None, **kwargs) → Tuple[Tensor, Dict[str, Tensor]]

Decode a full sequence or a single incremental step.

  • Parameters:
    • emb (Tensor) – Target embeddings of shape (batch_size, tgt_len, hidden_size).
    • enc_out (Tensor) – Encoder outputs of shape (batch_size, src_len, hidden_size).
    • step (int , optional) – Decoding step for incremental decoding.
    • **kwargs – Additional decoder-specific arguments.
  • Returns:
    • dec_outs: Decoder outputs of shape (batch_size, tgt_len, hidden_size).
    • attns: Dictionary of attention tensors.
      • attns["std"]: Attention weights of shape (batch_size, tgt_len, src_len).
  • Return type: (Tensor, Dict[str, Tensor])

init_state(**kwargs)

Initialize decoder state with last state of the encoder.

map_state(fn)

Apply a function to all tensors in decoder state.

Used for operations like moving to device or selecting batch indices.

  • Parameters: fn – Function to apply to each tensor in the state.

update_dropout(dropout, attention_dropout=None)

Update dropout rates dynamically.

  • Parameters:
    • dropout – New dropout rate for standard dropout layers.
    • attention_dropout – New dropout rate for attention layers (if applicable).

class eole.decoders.StdRNNDecoder(decoder_config, running_config=None)

Bases: RNNDecoderBase

Standard fully batched RNN decoder with attention.

Faster implementation that uses cuDNN. See RNNDecoderBase for options.

Based around the approach from “Neural Machine Translation By Jointly Learning To Align and Translate” (Bahdanau et al., 2015).

Implemented without input_feeding, and currently without coverage_attn.

class eole.decoders.InputFeedRNNDecoder(decoder_config, running_config=None)

Bases: RNNDecoderBase

Input feeding based decoder.

See RNNDecoderBase for options.

Based around the input feeding approach from “Effective Approaches to Attention-based Neural Machine Translation” (Luong et al., 2015).

update_dropout(dropout, attention_dropout=None)

Update dropout rates dynamically.

  • Parameters:
    • dropout – New dropout rate for standard dropout layers.
    • attention_dropout – New dropout rate for attention layers (if applicable).

class eole.decoders.CNNDecoder(decoder_config, running_config=None)

Bases: DecoderBase

Convolutional decoder with multi-step attention.

forward(emb: Tensor, enc_out: Tensor | None = None, step: int | None = None, **kwargs) → Tuple[Tensor, Dict[str, Tensor]]

Decode a sequence using convolutional layers.

  • Parameters:
    • emb (Tensor) – Target embeddings of shape (batch_size, tgt_len, hidden_size).
    • enc_out (Tensor) – Encoder outputs of shape (batch_size, src_len, hidden_size).
    • step (int , optional) – Decoding step for incremental decoding.
    • **kwargs – Additional decoder-specific arguments.
  • Returns:
    • dec_outs: Decoder outputs of shape (batch_size, tgt_len, hidden_size).
    • attns: Dictionary of attention tensors.
      • attns["std"]: Attention weights of shape (batch_size, tgt_len, src_len).
  • Return type: (Tensor, Dict[str, Tensor])

init_state(**kwargs)

Initialize decoder state.

map_state(fn)

Apply a function to all tensors in decoder state.

Used for operations like moving to device or selecting batch indices.

  • Parameters: fn – Function to apply to each tensor in the state.

update_dropout(dropout: float, attention_dropout: float | None = None) → None

Update dropout rates for all convolutional layers.

Attention